Time-Varying Gaussian Process Bandit Optimization
Authors
Abstract
We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution consists of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.
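The two forgetting strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`rbf_kernel`, `tv_posterior`, `reset_window`, `ucb_choice`), the squared-exponential kernel, the fixed `beta`, and in particular the time weights `(1 - eps)^(|i - j| / 2)` are assumptions chosen for illustration, with `eps` standing for a forgetting rate between 0 (no forgetting) and 1 (old data ignored).

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def tv_posterior(x_obs, y_obs, x_grid, eps, noise_var=0.01):
    """Posterior mean and variance on x_grid at the next time step.

    Smooth forgetting (assumed form): the correlation between
    observations i and j is damped by (1 - eps)^(|i - j| / 2).
    """
    n = len(x_obs)
    if n == 0:
        # No data: fall back to the zero-mean, unit-variance prior.
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    times = np.arange(n)
    d_mat = (1.0 - eps) ** (np.abs(times[:, None] - times[None, :]) / 2.0)
    d_vec = (1.0 - eps) ** ((n - times) / 2.0)  # damping relative to step n
    k_mat = rbf_kernel(x_obs, x_obs) * d_mat + noise_var * np.eye(n)
    k_vec = rbf_kernel(x_obs, x_grid) * d_vec[:, None]
    solved = np.linalg.solve(k_mat, k_vec)      # shape (n, len(x_grid))
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_vec * solved, axis=0)
    return mean, np.maximum(var, 0.0)

def reset_window(x_obs, y_obs, block_len):
    """Sharp resetting: keep only the data from the current block."""
    start = (len(x_obs) // block_len) * block_len
    return x_obs[start:], y_obs[start:]

def ucb_choice(mean, var, beta=4.0):
    """Index of the grid point maximizing the upper confidence bound."""
    return int(np.argmax(mean + np.sqrt(beta * var)))
```

Under these assumptions, setting `eps=0.0` gives an ordinary GP posterior, while passing the output of `reset_window` to `tv_posterior` with `eps=0.0` mimics restarting at regular intervals.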
Similar resources
Material for “Time-Varying Gaussian Process Bandit Optimization”
t (x)2, as was to be shown. B Learning ε via Maximum-Likelihood. In this section, we provide an overview of how ε can be learned from training data in a principled manner; the details can be found in [20, Section 4.3] and [6, Section 5]. Throughout this appendix, we assume that the kernel matrix is parametrized by a set of hyperparameters θ (e.g., θ = (ν, l) for the Matérn kernel), and ε. Let ȳ ...
On 2-armed Gaussian Bandits and Optimization
We explore the 2-armed bandit with Gaussian payoffs as a theoretical model for optimization. We formulate the problem from a Bayesian perspective, and provide the optimal strategy for both 1 and 2 pulls. We present regions of parameter space where a greedy strategy is provably optimal. We also compare the greedy and optimal strategies to a genetic-algorithm-based strategy. In doing so we correct...
TIME-VARYING FUZZY SETS BASED ON A GAUSSIAN MEMBERSHIP FUNCTIONS FOR DEVELOPING FUZZY CONTROLLER
The paper presents a novel type of fuzzy sets, called time-Varying Fuzzy Sets (VFS). These fuzzy sets are based on Gaussian membership functions; they depend on the error and are characterized by the displacement of the kernels to both the right and left sides of the universe of discourse, while the two extreme kernels of the universe are fixed for all time. In this work we focus only on t...
Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization
In this paper, we consider the problem of sequentially optimizing a black-box function f based on noisy samples and bandit feedback. We assume that f is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple ...
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP...